CLEVER: clique-enumerating variant finder
نویسندگان
چکیده
MOTIVATION Next-generation sequencing techniques have facilitated a large-scale analysis of human genetic variation. Despite the advances in sequencing speed, the computational discovery of structural variants is not yet standard. It is likely that many variants have remained undiscovered in most sequenced individuals. RESULTS Here, we present a novel internal segment size based approach, which organizes all, including concordant, reads into a read alignment graph, where max-cliques represent maximal contradiction-free groups of alignments. A novel algorithm then enumerates all max-cliques and statistically evaluates them for their potential to reflect insertions or deletions. For the first time in the literature, we compare a large range of state-of-the-art approaches using simulated Illumina reads from a fully annotated genome and present relevant performance statistics. We achieve superior performance, in particular, for deletions or insertions (indels) of length 20-100 nt. This has been previously identified as a remaining major challenge in structural variation discovery, in particular, for insert size based approaches. In this size range, we even outperform split-read aligners. We achieve competitive results also on biological data, where our method is the only one to make a substantial amount of correct predictions, which, additionally, are disjoint from those by split-read aligners. AVAILABILITY CLEVER is open source (GPL) and available from http://clever-sv.googlecode.com. CONTACT [email protected] or [email protected]. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
منابع مشابه
MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels
MOTIVATION Accurately predicting and genotyping indels longer than 30 bp has remained a central challenge in next-generation sequencing (NGS) studies. While indels of up to 30 bp are reliably processed by standard read aligners and the Genome Analysis Toolkit (GATK), longer indels have still resisted proper treatment. Also, discovering and genotyping longer indels has become particularly releva...
متن کاملLarge Maximal Cliques Enumeration in Large Sparse Graphs
Identifying communities in social networks is a problem of great interest. One popular type of community is where every member of the community knows all others, which can be viewed as a clique in the graph representing the social network. In several real life situations, finding small cliques may not be interesting as they are large in number and low in information content. Hence, in this pape...
متن کاملA Fast Parallel Maximum Clique Algorithm for Large Sparse Graphs and Temporal Strong Components
We propose a fast, parallel, maximum clique algorithm for large, sparse graphs that is designed to exploit characteristics of social and information networks. We observe roughly linear runtime scaling over graphs between 1000 vertices and 100M vertices. In a test with a 1.8 billion-edge social network, the algorithm finds the largest clique in about 20 minutes. For social networks, in particula...
متن کاملWhat if CLIQUE were fast? Maximum Cliques in Information Networks and Strong Components in Temporal Networks
Exact maximum clique finders have progressed to the point where we can investigate cliques in million-node social and information networks, as well as find strongly connected components in temporal networks. We use one such finder to study a large collection of modern networks emanating from biological, social, and technological domains. We show inter-relationships between maximum cliques and s...
متن کاملAn Efficient Algorithm for Enumerating Pseudo Cliques
The problem of finding dense structures in a given graph is quite basic in informatics including data mining and data engineering. Clique is a popular model to represent dense structures, and widely used because of its simplicity and ease in handling. Pseudo cliques are natural extension of cliques which are subgraphs obtained by removing small number of edges from cliques. We here define a pse...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 28 22 شماره
صفحات -
تاریخ انتشار 2012